Cross-document Coreference for WePS
نویسندگان
چکیده
A good clustering performance depends on the quality of the distance function used to asses similarity. In this paper we propose a pairwise document coreference model to improve performance over a wordvector similarity approach for the WePS 3 clustering task. We identify a simple criterion which discriminates between highly ambiguous queries, i.e. many small clusters, and balanced queries, i.e. fewer, larger clusters. A document clustering framework was developed facilitating direct comparison between different parameters, features and algorithms. It uses a unified feature representation to afford a wide variety of clustering pipelines. Using the predicted coreference likelihood and a simple clustering algorithm, we achieve comparable results on the WePS 2 dataset, and competitive performance on the WePS 3 dataset.
منابع مشابه
Enhancing Cross Document Coreference of Web Documents with Context Similarity and Very Large Scale Text Categorization
Cross Document Coreference (CDC) is the task of constructing the coreference chain for mentions of a person across a set of documents. This work offers a holistic view of using document-level categories, sub-document level context and extracted entities and relations for the CDC task. We train a categorization component with an efficient flat algorithm using thousands of ODP categories and over...
متن کاملSolving the "Who's Mark Johnson Puzzle": Information Extraction Based Cross Document Coreference
Cross Document Coreference (CDC) is the problem of resolving the underlying identity of entities across multiple documents and is a major step for document understanding. We develop a framework to efficiently determine the identity of a person based on extracted information, which includes unary properties such as gender and title, as well as binary relationships with other named entities such ...
متن کاملA Methodology for Cross-Document Coreference
Amit Bagga General Electric CRD, PO Box 8 Schenectady, NY 12301 [email protected] Alan W. Biermann Dept. of Computer Science Duke University Durham, NC 27708 [email protected] Cross-document coreference occurs when the same person, place, event, or concept is discussed in more than one text source. Computer recognition of this phenomenon is important because it helps break \the document boundary" ...
متن کاملName Perplexity
The accuracy of a Cross Document Coreference system depends on the amount of context available, which is a parameter that varies greatly from corpora to corpora. This paper presents a statistical model for computing name perplexity classes. For each perplexity class, the prior probability of coreference is estimated. The amount of context required for coreference is controlled by the prior core...
متن کاملHow Much Processing Is Required for Cross-Document Coreference?
Cross-document coreference occurs when the same person, place, event, or concept is discussed in more than one text source. Computer recognition of this phenomenon is important because it helps break \the document boundary" by allowing a user to examine information about a particular entity from multiple text sources at the same time. Cross-document coref-erence was identiied as one of the pote...
متن کامل